Skip to content

Change BFT Sync to process block responses as expected#3615

Closed
kaimast wants to merge 1 commit intoProvableHQ:stagingfrom
kaimast:validator-advance-block-sync
Closed

Change BFT Sync to process block responses as expected#3615
kaimast wants to merge 1 commit intoProvableHQ:stagingfrom
kaimast:validator-advance-block-sync

Conversation

@kaimast
Copy link
Contributor

@kaimast kaimast commented Apr 28, 2025

Motivation

Stress testing revealed an error in the recent changes to block synchronization. When validators fall behind they do not sync block properly. Meaning, the sync blocks like clients, and do not add the contained certificates to the BFT module.

This PR provides a fix for that by modifying Sync::advance_with_sync_blocks and adds more documentation to the Sync and BlockSync structs.

It also renames two functions to make their functionality clearer:

  • Sync::sync_storage_with_blocks was changed to Sync::try_advancing_block_synchronization to make it obvious that this is the validator version of BlockSync::try_advancing_block_synchronization.
  • Sync::try_block_sync was changed to Sync::issue_block_requests because it only create and sends them, but does not process any block responses.

Finally, the PR adds function Sync::try_block_sync was added that encapsulates a full iteration of a validator's block synchronization. This function is invoked manually in some unit tests (mostly in Primary's tests).

Test Plan

We will run more stress tests before merging this into staging (thanks to @kpandl for doing most of the work here).

Related PRs

#3543 introduced the bug that this PR fixes.

@kaimast kaimast force-pushed the validator-advance-block-sync branch from 34d520d to 14bd53d Compare April 28, 2025 17:20
@kaimast kaimast requested a review from raychu86 April 28, 2025 17:20
@kaimast kaimast force-pushed the validator-advance-block-sync branch from 14bd53d to 592672c Compare April 28, 2025 17:22
@raychu86
Copy link
Collaborator

Changes make sense, as we do not want the validator nodes to call the sync functions of the internal block_sync, but rather use higher level wrapper logic.

Couple questions:

  1. Is renaming sync_storage_with_blocks really more clear? There is now two instances of try_advancing_block_synchronization and they can't be clearly identified between validator and client impls without reading the documentation directly. The name doesn't really provide context that it is for validators.
  2. Is there a way for us to prevent future mistakes of calling the internal block_sync functions if we are on the validator level?

@kaimast kaimast closed this May 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants